Abstract. When performing maximum-likelihood quantum-state tomography, one
must find the quantum state that maximizes the likelihood of the state given observed
measurements on identically prepared systems. The optimization is usually performed
with iterative algorithms. This paper provides a gradient-based upper bound on
the ratio of the true maximum likelihood and the likelihood of the state of the
current iteration, regardless of the particular algorithm used. This bound is useful
for formulating stopping rules for halting iterations of maximization algorithms. We
discuss such stopping rules in the context of determining confidence regions from
log-likelihood differences when the differences are approximately chi-squared distributed.

1. Introduction

Quantum-state tomography is a statistical procedure for estimating the quantum state
$\rho_{\mathrm{true}}$ of a quantum system. One prepares many identical copies of the system, measures
each copy independently, and uses the measurement results to estimate $\rho_{\mathrm{true}}$. One useful
way to make the estimate is to find the state $\rho_{\mathrm{ML}}$ with the maximum likelihood for
the measurement results [1]. The estimation then becomes an optimization problem,
which is usually solved numerically with iterative methods. A stopping rule is needed
to decide when to halt the iterations; otherwise one risks stopping before the calculation
has reached a point 'near enough' to $\rho_{\mathrm{ML}}$ or wasting time with unnecessary iterations.
Using the difference between the state or the likelihood achieved in successive iterations 
is unreliable, especially if the maximization algorithm suffers from slow convergence. 
Ideally, the stopping rule should specify 'near enough' in a statistically relevant way: 
if there is large statistical uncertainty, high numerical precision may not be necessary. 
In this paper we give such stopping rules that depend on an upper bound on the ratio 
of the true maximum likelihood and the likelihood of the state achieved at the current 
iteration. The bound and stopping rules can be applied to any iterative likelihood 
maximization algorithm. The bound is particularly useful when a likelihood ratio is
used to assign uncertainties to the inferred state. 

This paper begins with the derivation of a bound on the ratio of the true maximum 
likelihood to the likelihood of a particular state. One may stop iterations when this 
ratio is sufficiently small. We next give a brief review of Wilks's Theorem, which gives 
the probability distribution for the likelihood ratio statistic. We use this theorem to 
give some rules-of-thumb for when one should halt iterations, in three contexts: (1) 
Point estimation. The goal is to obtain a point estimate for which the likelihood of the 
estimate is at least as large as the expectation value of the likelihood of the true state 
with respect to the data. (2) Confidence regions for states. Here the goal is to construct 
a confidence region for the true state at a given significance level. (3) Confidence 
regions for expectation values. In this case, we wish to construct a confidence region 
for expectation values of observables of the state. Our stopping criteria are formulated 
using Wilks's Theorem, which may not always apply in quantum state tomography 
experiments. We then present a numerical example of our likelihood ratio bound using 
simulated optical homodyne measurements. 

2. Likelihood ratio bound 

Suppose $N$ quantum systems are prepared, each in the state with density matrix $\rho_{\mathrm{true}}$.
For each copy $i$, the experimenter chooses an observable to measure. We will label each
observable with $\theta_i$ for $i = 1 \ldots N$. The measurements yield results $x_i$. Corresponding to
the choice and result combination, there is a positive-operator-valued-measure element
$\Pi(x_i|\theta_i) = \Pi_i$. For finite-dimensional systems, $N$ may exceed the number of possible
measurement choice and result combinations, so many of the $\Pi_i$ will equal one another.
However, for infinite-dimensional systems, the possible measurement results may be
infinite and continuous, so the $\Pi_i$ may all be different. The probability to observe $x$
when measuring $\theta$ is $p(x|\theta) = \mathrm{Tr}[\rho_{\mathrm{true}} \Pi(x|\theta)]$. The likelihood for observing the sequence
of measurement results $\{x_i : i = 1 \ldots N\}$ as a function of candidate density matrix $\rho$ is

$$\mathcal{L}(\rho) = \prod_{i=1}^{N} \mathrm{Tr}(\Pi_i \rho).$$

The goal of maximum-likelihood quantum-state tomography is to maximize this function
to obtain the state $\rho_{\mathrm{ML}}$ as the estimate of the true state $\rho_{\mathrm{true}}$. The optimization is easier
if we focus instead on the natural logarithm of the likelihood (the 'log-likelihood'):

$$L(\rho) = \ln \mathcal{L}(\rho) = \sum_{i=1}^{N} \ln[\mathrm{Tr}(\Pi_i \rho)].$$

The same density matrix maximizes both the likelihood and the log-likelihood. 
Fortunately, the log-likelihood is concave over the (convex) set of density matrices. 
One can show that it is concave by using the concavity of the logarithm, the linear 
dependence of the individual event probabilities on the density matrix, and the fact 
that the total log-likelihood is a positive sum of logarithms of the probabilities. This
concavity simplifies the maximization. Several maximization methods are described in
Refs. [1, 2, 3, 4]. These methods use iterative schemes, producing a new density matrix
$\rho_k$ after the $k$'th iteration.

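After iteration $k$, we would like to place an upper bound on $L(\rho_{\mathrm{ML}}) - L(\rho_k)$.
Consider the density matrix $\rho_\epsilon = (1 - \epsilon)\rho_k + \epsilon\rho_{\mathrm{ML}}$, where $0 \le \epsilon \le 1$. Because the
log-likelihood is concave, for any choice of $\epsilon$

$$L(\rho_\epsilon) - L(\rho_k) \le \epsilon \left.\frac{dL(\rho_\epsilon)}{d\epsilon}\right|_{\epsilon=0}.$$

In particular, when $\epsilon = 1$,

$$L(\rho_{\mathrm{ML}}) - L(\rho_k) \le \left.\frac{dL(\rho_\epsilon)}{d\epsilon}\right|_{\epsilon=0}.$$

The derivative evaluated at $\epsilon = 0$ is

$$\left.\frac{dL(\rho_\epsilon)}{d\epsilon}\right|_{\epsilon=0} = \mathrm{Tr}[\rho_{\mathrm{ML}} R(\rho_k)] - N,$$

where

$$R(\rho_k) = \sum_{i=1}^{N} \frac{\Pi_i}{\mathrm{Tr}(\rho_k \Pi_i)}.$$

This is the same matrix $R$ that is used in the 'R$\rho$R' algorithm described in [3]. Of
course, we do not know $\rho_{\mathrm{ML}}$, so we find an upper bound of $\mathrm{Tr}[\rho_{\mathrm{ML}} R(\rho_k)]$ by maximizing
$\mathrm{Tr}[\sigma R(\rho_k)]$ over all density matrices $\sigma$:

$$L(\rho_{\mathrm{ML}}) - L(\rho_k) \le \max_{\sigma} \mathrm{Tr}[\sigma R(\rho_k)] - N.$$

This maximum is achieved for $\sigma$ equal to the pure density matrix corresponding to the
eigenstate of $R(\rho_k)$ with the largest eigenvalue. Thus

$$L(\rho_{\mathrm{ML}}) - L(\rho_k) \le r(\rho_k),$$

where $r(\rho_k) = r_k = \max\{\mathrm{eig}[R(\rho_k)]\} - N$. After exponentiation, we obtain

$$\frac{\mathcal{L}(\rho_{\mathrm{ML}})}{\mathcal{L}(\rho_k)} \le e^{r_k}.$$

Thus one may stop iterations when $r_k$ is less than a predetermined bound specified by
a stopping rule. Specific bounds depend on context as we discuss below.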
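
The bound is cheap to evaluate once the measurement operators are stored. The following is a minimal sketch, assuming NumPy and assuming the POVM elements for the $N$ observed outcomes are held in an array of shape (N, d, d). The function names and the random test data are illustrative and do not come from the paper.

```python
import numpy as np

def log_likelihood(rho, povm):
    """L(rho) = sum_i ln Tr(Pi_i rho), with one POVM element per observed outcome."""
    probs = np.real(np.einsum("nij,ji->n", povm, rho))
    return float(np.sum(np.log(probs)))

def r_matrix(rho, povm):
    """R(rho) = sum_i Pi_i / Tr(rho Pi_i)."""
    probs = np.real(np.einsum("nij,ji->n", povm, rho))
    return np.einsum("n,nij->ij", 1.0 / probs, povm)

def likelihood_gap_bound(rho, povm):
    """r(rho) = max eig R(rho) - N, an upper bound on L(rho_ML) - L(rho)."""
    n_obs = povm.shape[0]
    largest = np.linalg.eigvalsh(r_matrix(rho, povm))[-1]  # eigenvalues are ascending
    return float(largest - n_obs)

def random_projectors(n_obs, dim, rng):
    """Illustrative data only: random rank-one projectors standing in for the Pi_i."""
    vecs = rng.normal(size=(n_obs, dim)) + 1j * rng.normal(size=(n_obs, dim))
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
    return np.einsum("ni,nj->nij", vecs, vecs.conj())

def random_density_matrix(dim, rng):
    """Illustrative full-rank density matrix used to probe the bound."""
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(0)
povm = random_projectors(50, 3, rng)
rho_k = np.eye(3) / 3  # fully mixed state as the current iterate
print("r_k =", likelihood_gap_bound(rho_k, povm))
```

Because the concavity argument holds for any density matrix in place of $\rho_{\mathrm{ML}}$, no candidate state can exceed $L(\rho_k)$ by more than $r_k$, which gives a simple sanity check on an implementation.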

The above ideas could also be adapted for a simple gradient-ascent maximization
procedure, as follows: Initialize the procedure with some state $\rho_0$, perhaps the fully
mixed state. At each iteration, set $\sigma$ equal to the eigenstate of $R(\rho_k)$ with the largest
eigenvalue. Then use a one-dimensional optimization procedure to find the $\epsilon$ maximizing
$L(\rho_\epsilon)$, where now $\rho_\epsilon = (1 - \epsilon)\rho_k + \epsilon\sigma$, and set $\rho_{k+1} = \rho_\epsilon$. However, such a procedure can have slow convergence because
it uses only the slope of the log-likelihood function and not its curvature.
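
The procedure just described can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes NumPy, uses simulated single-qubit Pauli measurements as hypothetical data, and uses a golden-section search as the one-dimensional optimizer (any method suited to a concave function would do).

```python
import numpy as np

PAULIS = np.array([[[0, 1], [1, 0]],
                   [[0, -1j], [1j, 0]],
                   [[1, 0], [0, -1]]], dtype=complex)

def simulate_pauli_data(rho_true, n_obs, rng):
    """Hypothetical data: one projector (I +/- P)/2 per simulated Pauli measurement."""
    identity = np.eye(2, dtype=complex)
    povm = []
    for _ in range(n_obs):
        pauli = PAULIS[rng.integers(3)]
        p_plus = np.real(np.trace(rho_true @ (identity + pauli))) / 2.0
        sign = 1.0 if rng.random() < p_plus else -1.0
        povm.append((identity + sign * pauli) / 2.0)
    return np.array(povm)

def log_likelihood(rho, povm):
    probs = np.real(np.einsum("nij,ji->n", povm, rho))
    return float(np.sum(np.log(probs)))

def r_matrix(rho, povm):
    probs = np.real(np.einsum("nij,ji->n", povm, rho))
    return np.einsum("n,nij->ij", 1.0 / probs, povm)

def line_search(rho, sigma, povm, tol=1e-8):
    """Golden-section search for eps in (0, 1) maximizing L((1 - eps) rho + eps sigma)."""
    def f(eps):
        return log_likelihood((1.0 - eps) * rho + eps * sigma, povm)
    g = (np.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        a = hi - g * (hi - lo)
        b = lo + g * (hi - lo)
        if f(a) < f(b):
            lo = a  # the maximum of a concave function lies in [a, hi]
        else:
            hi = b  # the maximum lies in [lo, b]
    return 0.5 * (lo + hi)

def gradient_ascent(povm, r_stop, max_iter=200):
    """Iterate until r_k < r_stop or max_iter is reached; returns the state and (L, r_k) history."""
    n_obs, dim = povm.shape[0], povm.shape[1]
    rho = np.eye(dim, dtype=complex) / dim  # fully mixed starting state
    history = []
    for _ in range(max_iter):
        eigenvalues, eigenvectors = np.linalg.eigh(r_matrix(rho, povm))
        r_k = float(eigenvalues[-1] - n_obs)
        history.append((log_likelihood(rho, povm), r_k))
        if r_k < r_stop:
            break
        v = eigenvectors[:, -1]
        sigma = np.outer(v, v.conj())  # pure state for the largest eigenvalue of R
        eps = line_search(rho, sigma, povm)
        rho = (1.0 - eps) * rho + eps * sigma
    return rho, history

rng = np.random.default_rng(1)
rho_true = np.array([[0.8, 0.2], [0.2, 0.2]], dtype=complex)
povm = simulate_pauli_data(rho_true, 300, rng)
rho_hat, history = gradient_ascent(povm, r_stop=0.5)
print(len(history), history[-1])
```

The search only evaluates interior values of $\epsilon$, so every iterate stays full rank and all event probabilities remain positive. The loop may exhaust `max_iter` before reaching the requested `r_stop`, which reflects the slow convergence noted above.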

3. Review of Wilks's theorem

Of course, the stopping rule, that is, the value of $r_k$ below which one can halt iterations,
depends on how the estimate is used. In the following we discuss the use of $L(\rho_k)$ and
$r_k$ to establish two types of confidence regions related to our estimate. The asymptotic
theory of likelihood-ratio tests provides guidelines. A key technique is the application
of Wilks's Theorem; see Ref. [5] and section 6.4 of Ref. [6]. Wilks's Theorem states that
under appropriate assumptions, for two sets of models $H_0 \subset H$ specified by $h_0$ and $h$ free
parameters, respectively, the random variable $D(H_0|X) = 2[L(H_{\mathrm{ML}}|X) - L(H_{0,\mathrm{ML}}|X)]$
converges in distribution to $\chi^2(h - h_0)$, the chi-squared distribution with $h - h_0$ degrees
of freedom. Here, $L(H_{0,\mathrm{ML}}|X)$ and $L(H_{\mathrm{ML}}|X)$ are the maximum log-likelihoods for $H_0$
and $H$, respectively, and we assume that the true state is in the interior of $H$ with
respect to the parametrization. The parametrization must be sufficiently well-behaved;
see the references above. We can apply this to hypotheses consisting of linear spaces of
density matrices parametrized with respect to a linear basis, provided the true density
matrix is not too near the boundary, that is, has no statistically near-zero eigenvalues.

4. Point estimate stopping rule 

As a first approximation to be refined below, we intuit that little further information
about $\rho_{\mathrm{true}}$ is obtained once $L(\rho_{\mathrm{ML}}|\{x_i\}) - L(\rho_k|\{x_i\})$ is below $\langle L(\rho_{\mathrm{ML}}|X) - L(\rho_{\mathrm{true}}|X)\rangle$,
where $\langle\cdot\rangle$ is the expectation value for the enclosed random variable, and $X$ is a random
vector of length $N$ distributed according to $\rho_{\mathrm{true}}$. If $\rho_{\mathrm{true}}$ is an interior point of the space
of density matrices and $N$ is sufficiently large, the expectation of $L(\rho_{\mathrm{ML}}|X) - L(\rho_{\mathrm{true}}|X)$
can be approximated by an application of Wilks's Theorem. That is, let $H$ consist of
all density matrices of dimension $d$; $H$ has $d^2 - 1$ free parameters. Let $H_0$ contain only
one element, $\rho_{\mathrm{true}}$. Then the random variable $D(\rho_{\mathrm{true}}|X) = 2[L(\rho_{\mathrm{ML}}|X) - L(\rho_{\mathrm{true}}|X)]$
converges in distribution to $\chi^2(d^2 - 1)$. This distribution has expectation $d^2 - 1$, so

$$\langle L(\rho_{\mathrm{ML}}|X) - L(\rho_{\mathrm{true}}|X)\rangle = \frac{1}{2}(d^2 - 1).$$

According to this intuition, one can stop iterations when $r_k$ is less than a fraction of
$(d^2 - 1)/2$.

To make the above intuition more precise, a reasonable stopping rule can be based
on the requirement that $\rho_k$ be in a confidence region for the true state at a reasonable
level of significance $s$. Such a confidence region can be constructed from likelihood-ratio
hypothesis tests with level of significance $s$. This confidence region is defined as the
set of density matrices (or other parameters) $\rho$ for which the observations $\{x_i\}$ and the
associated likelihood ratio would not lead us to reject the hypothesis that $\rho$ is $\rho_{\mathrm{true}}$ at
level of significance $s$; see theorem 7.2 in Ref. [6]. Here we reject the hypothesis that $\rho$ is
$\rho_{\mathrm{true}}$ if the observed log-likelihood difference $D(\rho|\{x_i\})$ has a p-value less than $s$, where
the p-value is the probability that the state would (if it were the true state) produce a
value for $D(\rho|X)$ at least the observed value. In general, p-values are associated with
a random variable, in this case $D(\rho|X)$. For brevity, we omit mention of the random
variable when the random variable is clear from context. According to Wilks's Theorem,
we can calculate a state's p-value as the integral

$$\text{p-value} = \int_{D(\rho|\{x_i\})}^{\infty} f(u)\,du,$$

where $f(u)$ is the probability density function for $\chi^2(d^2 - 1)$. Notice that smaller values
for $D(\rho|\{x_i\})$ correspond to larger p-values. Let $t$ be the value of $D(\rho|\{x_i\})$ that gives
a p-value equal to $s$. That is, $s = \int_t^{\infty} f(u)\,du$. Thus the confidence region at level
of significance $s$ is $\{\rho : D(\rho|\{x_i\}) \le t\}$. We can ensure that our estimate $\rho_k$ is in
a confidence region at a predetermined level of significance by stopping when $r_k$ is
sufficiently small. The level of significance determines the statistical closeness of $\rho_k$ to
$\rho_{\mathrm{true}}$. Higher levels of significance imply closer $\rho_k$. For example, a level of significance
of 0.5 requires that $2r_k$ is at most the median of the $\chi^2(d^2 - 1)$ distribution. The mean
and variance of the $\chi^2(f)$ distribution are $f$ and $2f$, respectively, and as $f$ increases, $\chi^2(f)$
converges in distribution to a Gaussian with the given mean and variance [7]. Therefore,
to ensure that $\rho_k$ is in a confidence region for the true state with $s \approx 0.5$, for large $d$,
one may stop iterations when $r_k$ is below $(d^2 - 1)/2$.
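
The step from the bound $r_k$ to membership in the confidence region can be written out explicitly. As a sketch, using only the bound of section 2 and the definition of $D$, with $t$ denoting the threshold at which the p-value equals $s$:

```latex
D(\rho_k|\{x_i\})
  = 2\left[L(\rho_{\mathrm{ML}}|\{x_i\}) - L(\rho_k|\{x_i\})\right]
  \le 2 r_k ,
\qquad\text{so}\qquad
2 r_k \le t
  \;\Longrightarrow\;
  \rho_k \in \{\rho : D(\rho|\{x_i\}) \le t\}.
```

For $s \approx 0.5$ and large $d$ the threshold $t$ is close to the mean $d^2 - 1$ of the distribution, which gives the condition $r_k \le (d^2 - 1)/2$.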

5. State confidence region stopping rule 

Another potential use of $\rho_k$ is to construct a confidence region of states based on
$L(\rho_k|\{x_i\})$. When determining confidence regions rather than a statistically good
approximation of $\rho_{\mathrm{true}}$, it is not enough to ensure that $\rho_k$ is statistically close to
$\rho_{\mathrm{ML}}$. Because $\rho_{\mathrm{ML}}$ is not known exactly, we construct a confidence region at level of
significance $s$ by replacing $L(\rho_{\mathrm{ML}}|\{x_i\})$ in the conventional definition of the
likelihood-ratio confidence region with the log-likelihood of our estimate $L(\rho_k|\{x_i\})$. As we explain
below, this confidence region contains the conventional one. For the confidence region
to be a good approximation of the maximum-likelihood confidence region,
$r_k$ must be less than a fraction of $\sqrt{(d^2 - 1)/2}$, which is half the standard deviation of $\chi^2(d^2 - 1)$. This
rule ensures that approximate p-values computed according to $2[L(\rho_k|\{x_i\}) - L(\rho|\{x_i\})]$
and the actual p-values computed with $D(\rho|\{x_i\})$ are sufficiently close. As a numerical
example, consider tomography of a $d = 10$ quantum system, where we wish to construct
the confidence region at level of significance 0.32. In this case, $\sqrt{(d^2 - 1)/2} = 7.04$, and
the threshold for $D(\rho|\{x_i\})$ is $t = 105.04$. Suppose we stop iterations when $r_k \le 2$. If
we construct a confidence region as the set of $\rho$ for which $2[L(\rho_k|\{x_i\}) - L(\rho|\{x_i\})] \le t$,
the region includes all $\rho$ with p-values above 0.32, but may contain states with
p-values as low as 0.23, because the true value of $D(\rho|\{x_i\})$ can be as large as
$2[L(\rho_k|\{x_i\}) + r_k - L(\rho|\{x_i\})] = 105.04 + (2 \times 2)$ for those states. Note that the p-values
included in the region are not data dependent provided we choose the stopping rule
beforehand. If we stop at $r_k \le 1.5$ and set the significance level at 0.05, corresponding
to a threshold $t = 123.22$, the confidence region may contain states with p-values as low
as 0.03.
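
The figures quoted in this example can be checked numerically. The following is a minimal sketch using only the Python standard library. The chi-squared tail probability is approximated with Simpson's rule, a choice made here for self-containment rather than anything prescribed by the text, and the function names are illustrative.

```python
import math

def chi2_pdf(u, dof):
    """Probability density of chi-squared(dof) at u > 0, computed in log space."""
    half = dof / 2.0
    return math.exp((half - 1.0) * math.log(u) - u / 2.0
                    - half * math.log(2.0) - math.lgamma(half))

def chi2_p_value(t, dof, n_steps=20000):
    """P(chi2(dof) >= t) by Simpson's rule on [t, upper]; the tail beyond upper is negligible."""
    upper = max(t, dof) + 50.0 * math.sqrt(2.0 * dof)
    h = (upper - t) / n_steps  # n_steps must be even for Simpson's rule
    total = chi2_pdf(t, dof) + chi2_pdf(upper, dof)
    for j in range(1, n_steps):
        total += (4.0 if j % 2 else 2.0) * chi2_pdf(t + j * h, dof)
    return total * h / 3.0

def worst_case_p_value(t, r_stop, dof):
    """Smallest p-value a state can have while still satisfying 2[L(rho_k) - L(rho)] <= t."""
    return chi2_p_value(t + 2.0 * r_stop, dof)

d = 10
dof = d**2 - 1
print("sqrt((d^2 - 1)/2) =", math.sqrt(dof / 2.0))
print("s = 0.32 case:", chi2_p_value(105.04, dof), worst_case_p_value(105.04, 2.0, dof))
print("s = 0.05 case:", chi2_p_value(123.22, dof), worst_case_p_value(123.22, 1.5, dof))
```

The second number printed on each line is the p-value at the inflated threshold $t + 2r_k$, which is the quantity the example reports as the lowest p-value that may enter the region.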

6. Expectation value confidence interval stopping rule

Another way to utilize tomographic data is to estimate expectation values, such as
$\mathrm{Tr}(\rho_{\mathrm{true}} A)$, and give a confidence interval for the estimate at a given level of significance
$s$. Let $F(f) = \{\rho : \mathrm{Tr}(\rho A) = f\}$ be a level set for $\mathrm{Tr}(\rho_{\mathrm{true}} A)$. The dimensionality of this
level set is $d^2 - 2$. Let $\phi_{\mathrm{ML},f}$ be the state in $F(f)$ maximizing the likelihood. To establish
a confidence region for $f$ via a likelihood-ratio test, we use the statistic $D(\phi_{\mathrm{ML},f}|X)$.
For each $f$ there is an associated p-value (the p-value of the log-likelihood difference
between $\rho_{\mathrm{ML}}$ and $\phi_{\mathrm{ML},f}$), and all $f$'s with p-value at least $s$ are in the confidence region.
By Wilks's Theorem, the statistic $D(\phi_{\mathrm{ML},f}|X)$ has distribution $\chi^2(1)$. Let $t$ be the
maximum value of $D(\phi_{\mathrm{ML},f}|X)$ for which $f$ is a member of the confidence interval at
significance level $s$. Following the discussion above, $t$ is related to $s$ through the integral
of the $\chi^2(1)$ distribution. The confidence region for $f$ is $C = \{f : D(\phi_{\mathrm{ML},f}|\{x_i\}) \le t\}$. It
is necessary to adapt the stopping rule to the maximum-likelihood problem constrained
to $F(f)$. It is not practical to compute $L(\phi_{\mathrm{ML},f})$ for all $f$, nor is it necessary to
do so. We may use the Lagrange multiplier technique to compute $L(\phi_{\mathrm{ML},f})$. With $\lambda$
as the Lagrange multiplier, we maximize $K(\rho,\lambda) = L(\rho) + \lambda\mathrm{Tr}(\rho A)$. This function
is still concave over the full space of density matrices and can be maximized by the
same methods as $L(\rho)$ after replacing $R(\rho)$ with $R(\rho) + \lambda A$. In the standard Lagrange
multiplier technique, one usually solves an equation for the value of $\lambda$ that corresponds
to the desired constraint $f$. Solving such an equation in this case would be difficult, and
we do not know the desired $f$ in advance. We need to approximate the values of $f$ that
are the limits of the confidence interval. To accomplish this, we choose a value for $\lambda$ and
maximize $K(\rho,\lambda)$ to find a state $\phi_{\mathrm{ML},\lambda}$ that is the maximum-likelihood state obeying
the constraint $\mathrm{Tr}(\phi_{\mathrm{ML},\lambda} A) = f_\lambda$, where $f_\lambda$ depends on the choice for $\lambda$. If $\phi_{\mathrm{ML},\lambda}$ has
the desired log-likelihood difference $t$, $f_\lambda$ marks one boundary of the confidence interval.
If not, we search for the desired $\lambda$ by re-maximizing $K(\rho,\lambda)$ with different choices of
$\lambda$. This search is simplified by the observation that $\lambda$ is monotonically related to the
log-likelihood of $\phi_{\mathrm{ML},\lambda}$. This follows from concavity of the log-likelihood: The maximum
log-likelihood $L(f)$ on level set $F(f)$ is a concave function of $f$ and $-\lambda$ is the slope of
$L(f)$ at $f = f_\lambda$.

Given an iterative method for maximizing $K(\rho,\lambda)$, the upper bound on
log-likelihood derived from $r_k$ generalizes, yielding a bound $r(\phi_j)$ on the maximum possible
increase in $K(\rho,\lambda)$ at the $j$'th iterate $\phi_j$. Let $f_j = \mathrm{Tr}(\phi_j A)$. Since $K(\rho,\lambda)$ and $L(\rho)$ differ
by a constant on each level set $F(f)$, $r(\phi_j)$ is a bound on $L(\phi_{\mathrm{ML},f_j}) - L(\phi_j)$. Given the iterate found after
stopping, we can bound the true value of the desired log-likelihood difference by

$$D(\phi_{\mathrm{ML},f_j}|\{x_i\}) \ge D_{\mathrm{lb}} = 2\left[L(\rho_k|\{x_i\}) - L(\phi_j|\{x_i\}) - r(\phi_j)\right],$$
$$D(\phi_{\mathrm{ML},f_j}|\{x_i\}) \le D_{\mathrm{ub}} = 2\left[L(\rho_k|\{x_i\}) + r(\rho_k) - L(\phi_j|\{x_i\})\right].$$
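
The generalized bound $r(\phi_j)$ is not written out in the text. One way to obtain it, sketched here on the assumption that the concavity argument of section 2 is simply repeated with $K$ in place of $L$, uses $\mathrm{Tr}[\phi_j R(\phi_j)] = N$ and $\mathrm{Tr}(\phi_j A) = f_j$:

```latex
K(\sigma,\lambda) - K(\phi_j,\lambda)
  \le \left.\frac{dK\big((1-\epsilon)\phi_j + \epsilon\sigma,\lambda\big)}{d\epsilon}\right|_{\epsilon=0}
  = \mathrm{Tr}\{\sigma[R(\phi_j) + \lambda A]\} - N - \lambda f_j ,
\qquad
r(\phi_j) = \max\{\mathrm{eig}[R(\phi_j) + \lambda A]\} - N - \lambda f_j .
```

Setting $\lambda = 0$ recovers $r(\rho_k) = \max\{\mathrm{eig}[R(\rho_k)]\} - N$.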

For a conservative approximation of the desired confidence interval, we run the iterative
method with a stopping rule, seeking lower and upper bounds $f_j$ for which $D_{\mathrm{lb}}$ is close to
$t$. To ensure a conservative estimate, it should be at least $t$. To avoid an unnecessarily
large confidence interval, we should ensure that $r(\rho_k)$ and $r(\phi_j)$ are sufficiently small
fractions of $\sqrt{2}$, the standard deviation of $\chi^2(1)$. For example, suppose that we wish to
approximate a confidence interval at significance level 0.32. The threshold for $D(\phi_{\mathrm{ML},f_j})$
is $t = 0.99$. Suppose that we stop iterations when $r(\rho_k), r(\phi_j) \le 0.3$ and set the
confidence interval according to $\{f : D_{\mathrm{lb}} < t\}$. Then the confidence interval includes
all $f$'s with p-values larger than 0.32 and may contain $f$'s with p-values as low as 0.21.
If we set the threshold at $t = 3.84$ according to a significance level of 0.05 and stop at
$r(\rho_k), r(\phi_j) \le 0.2$, the confidence interval may contain $f$'s with p-values as low as 0.04.
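
For one degree of freedom the p-value has a closed form, so the thresholds above and the interval $[D_{\mathrm{lb}}, D_{\mathrm{ub}}]$ are easy to evaluate. The following is a minimal sketch using only the Python standard library. The function names and the log-likelihood values in the demonstration are hypothetical.

```python
import math

def chi2_1_p_value(t):
    """P(chi2(1) >= t) = P(|Z| >= sqrt(t)) = erfc(sqrt(t / 2)) for a standard normal Z."""
    return math.erfc(math.sqrt(t / 2.0))

def difference_bounds(L_rho_k, r_rho_k, L_phi_j, r_phi_j):
    """Lower and upper bounds on D(phi_ML,f_j) from the two iterates and their r values."""
    d_lb = 2.0 * (L_rho_k - L_phi_j - r_phi_j)
    d_ub = 2.0 * (L_rho_k + r_rho_k - L_phi_j)
    return d_lb, d_ub

print("p-value at t = 0.99:", chi2_1_p_value(0.99))
print("p-value at t = 3.84:", chi2_1_p_value(3.84))

# Hypothetical log-likelihoods for the unconstrained and constrained iterates.
d_lb, d_ub = difference_bounds(L_rho_k=-100.0, r_rho_k=0.3, L_phi_j=-101.0, r_phi_j=0.3)
print("D_lb, D_ub =", d_lb, d_ub)
```

The width $D_{\mathrm{ub}} - D_{\mathrm{lb}} = 2[r(\rho_k) + r(\phi_j)]$ shows directly how the two stopping tolerances limit the precision with which the boundary of the confidence interval is located.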

7. Numerical simulation 

To illustrate the behaviour of the bound used for the stopping rules, we simulated
homodyne measurements [8] of a state created by sending a superposition of optical
coherent states ($|\alpha\rangle + |-\alpha\rangle$, unnormalized, with $\alpha = 1$) through a medium whose
transmissivity is 80 %. The homodyne measurements are 90 % efficient. The Hilbert
space was truncated at 10 photons. Results are shown in Fig. 1, where we have used
the R$\rho$R algorithm to maximize the likelihood. To make this figure, we computed 1122
iterations and assigned $\rho_{\mathrm{ML}} = \rho_{1122}$. Further iterations suffered from numerical errors.
After an initial phase of very fast likelihood increase, the convergence rate significantly
drops. As expected, $L(\rho_{\mathrm{ML}}) - L(\rho_k)$ decreases with each iteration, but $r_k$ sometimes
increases. There is a significant gap between $r_k$ and $L(\rho_{\mathrm{ML}}) - L(\rho_k)$, so it would be
helpful to find tighter bounds to prevent unnecessary iterations. Perhaps the bound
could be made tighter using a higher order expansion of the log-likelihood function
in $\epsilon$. Without a reliable stopping rule such as one based on $r_k$, a simple strategy is to
stop iterations when the difference between successive density matrices obtained is very
small. For comparison, the figure includes a plot of the trace distance between $\rho_k$ and
$\rho_{k+1}$. According to the rough guidelines given above, if we want to use the result of this
computation to obtain a confidence interval for an expectation value of the true state,
we might halt iterations when $r_k < 0.1$, at which point the trace distance between $\rho_k$
and $\rho_{k+1}$ is $3.6 \times 10^{-7}$. In general, the relationship between $r_k$ and trace distance is
dependent on the situation.
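
The trace distance used for this comparison, $\mathrm{Tr}[|\rho_k - \rho_{k+1}|]/2$, can be computed from the eigenvalues of the Hermitian difference of the two states. A minimal sketch, assuming NumPy, with an illustrative function name and single-qubit test states:

```python
import numpy as np

def trace_distance(rho_a, rho_b):
    """Tr|rho_a - rho_b| / 2, from the eigenvalues of the Hermitian difference."""
    eigenvalues = np.linalg.eigvalsh(rho_a - rho_b)
    return 0.5 * float(np.sum(np.abs(eigenvalues)))

zero = np.array([[1.0, 0.0], [0.0, 0.0]])
one = np.array([[0.0, 0.0], [0.0, 1.0]])
mixed = np.eye(2) / 2
print(trace_distance(zero, one), trace_distance(zero, mixed))
```

Orthogonal pure states are at distance 1 and identical states at distance 0, which gives a quick check when tracking successive iterates.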

8. Conclusion 

Our bounds on the likelihood ratios hold regardless of Wilks's Theorem, but we
have used Wilks's Theorem to construct the confidence regions described above.
Wilks's Theorem must be applied carefully (if at all) when performing quantum state
tomography. In particular the techniques discussed above cannot be used if the true
state has eigenvalues that are statistically close to 0 or if there is insufficient data
for the limiting distributions to be good approximations. Both of these situations
are common in applications of tomography and can result in bad confidence regions
and excessive biases. For example, we encountered such difficulties analyzing the data
reported in Ref. [9]; see the discussion in this reference's supplementary materials. When
Wilks's Theorem cannot be applied, one must resort to other techniques such as the
robust bounds on log-likelihood differences described in [10, 11] or parametric or
non-parametric bootstrap [12] for estimating statistical errors and confidence regions. The
bootstrap methods require running maximum-likelihood algorithms on many simulated
or resampled data sets. Judicious use of one of the stopping rules given above can
significantly reduce the number of iterations required when optimizing the likelihood,
thereby making it possible to implement bootstrap with more resampled data sets to
obtain better estimates. However, if bias in the maximum-likelihood estimate is large,
confidence regions constructed by bootstrap may also be unreliable [12].

Figure 1. The left graph shows $L(\rho_{\mathrm{ML}}) - L(\rho_k)$ and $r_k$ as a function of iteration
number $k$ for optical homodyne tomography of a $d = 11$ (10 photon) dimensional
quantum state. The right graph shows the trace distance between state $\rho_k$ and $\rho_{k+1}$.
Trace distance was calculated as $\mathrm{Tr}[|\rho_k - \rho_{k+1}|]/2$.

We have presented an upper bound on the log-likelihood difference of the 
maximum-likelihood state and the currently found state in iterative algorithms for 
maximum-likelihood tomography. The bound is easily computed from the gradient 
of the log-likelihood function and can be used in stopping rules for confidence regions 
or decisions that use the likelihood ratio as a test statistic. 